Synthesis of initial (/s/-) stop-liquid clusters using HLsyn

Author

  • David R. Williams
Abstract

This paper describes the synthesis of English syllable-initial stop-liquid and /s/-stop-liquid clusters using the HLsyn speech synthesis program. The articulo-acoustic parameters of HLsyn permit efficient synthesis of most consonant types; the parameter specifications also capture important generalizations about how related sets of consonants are produced. Here, we discuss settings of a small number of parameters that permit synthesis of 60 different phonetic sequences.

1. THE HL SYNTHESIS APPROACH

A primary motivation for the HL synthesis approach is to combine the simplicity of control that characterizes articulatory approaches to synthesis with the accuracy and computational efficiency of traditional formant synthesis [1, 2]. This hybrid approach employs a small set of high-level (HL) parameters to construct an articulo-acoustic utterance specification, which is then transformed by a set of physiologically and acoustically motivated mapping relations into a specification in terms of the larger set of lower-level (LL) acoustic parameters needed to control a KLSYN88 formant synthesizer [3]. In effect, the HLsyn synthesis system provides an articulatory interface to a formant synthesizer.

1.1. Functions of the HLsyn parameters

Ten user-settable parameters are included in the HLsyn synthesis system. The functions of these parameters can be described in terms of three broad classes:

1. Class 1 parameters control the first four natural frequencies of the vocal tract (f1, f2, f3, f4); these parameters specify acoustically the vocal tract configuration and slow movements of articulators. The f0 parameter specifies the fundamental frequency.
2. Class 2 parameters control cross-sectional areas of local constrictions formed by the lips (al) and the tongue tip/blade (ab). They specify the fast movements of primary articulators that rapidly decrease or increase airflow within the oral tract.
3. Class 3 parameters control cross-sectional areas of the glottal orifice (ag) and velopharyngeal port (an) and the pharyngeal volume (ue). These parameters specify opening and closing movements of the glottis and velum and active expansion or contraction of the pharynx.

1.2. HLsyn mapping relations

After the HL parameter values have been specified, the first step in determining values for the LL parameters is to calculate the pressures and flows at the supraglottal and glottal orifices using an aerodynamic model [4]. In addition to the Class 2 and 3 parameter values, inputs to the model include agx (the glottal orifice area as modified by supraglottal forces) and acx (the smallest current supraglottal constriction area). The output of the model is an estimate of the intraoral pressure (Pm) which, along with the orifice areas and a constant subglottal pressure (Ps) value, provides the basis for computing the LL source amplitudes AV, AH and AF.

Other settings and modifications of the LL parameters result from values of HL parameters specified by the user. In general, the Class 1 parameters are mapped directly to their corresponding LL parameters when the glottal area is modal and the velum is closed. An increased glottal area agx affects the LL formant bandwidths and the values of OQ and TL. The presence of voicing in the synthesis signal is conditional on agx (AV = 0 when agx > 15 mm²). When AF > 40, place-specific filtering of the frication is determined from a look-up table based on the values of f2 and f3.

The HL parameter f1, the first natural resonance, plays several important roles in the synthesis specification. When a Class 2 parameter specifies a local labial (al) or alveolar (ab) constriction, f1 is modified (f1c) to reflect the fact that the constriction is currently controlling the acoustic properties of the vocal tract.
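
The threshold conditions stated in this subsection can be summarized in a few lines of Python. This is only an illustrative sketch and not the HLsyn implementation: the function and constant names are hypothetical, and the unit of agx is assumed to be mm². Only the two threshold values (15 and 40) and the definition of acx as the smallest supraglottal constriction area come from the text.

```python
# Illustrative sketch only; all names are hypothetical, not HLsyn's API.

VOICING_AGX_LIMIT = 15.0   # from the text: AV = 0 when agx > 15 (mm^2 assumed)
FRICATION_AF_LIMIT = 40.0  # from the text: look-up table is used when AF > 40


def smallest_constriction(first_area, *other_areas):
    """Return acx, the smallest of the supplied supraglottal constriction areas."""
    return min((first_area,) + other_areas)


def gate_voicing(av, agx):
    """Return the voicing amplitude AV, forced to 0 when agx exceeds the limit."""
    return 0.0 if agx > VOICING_AGX_LIMIT else av


def needs_place_filter(af):
    """Return True when AF is high enough for place-specific frication filtering."""
    return af > FRICATION_AF_LIMIT


if __name__ == "__main__":
    # A wide-open glottis switches voicing off regardless of the incoming AV.
    print(gate_voicing(60.0, 20.0))
    # The lips (al) are narrower than the tongue blade (ab), so they define acx.
    print(smallest_constriction(8.0, 100.0))
    print(needs_place_filter(45.0))
```

Keeping the thresholds as named constants makes it easy to see which numbers are taken from the paper and which parts of the sketch are assumptions.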

The value of f1c is approximated as the lowest frequency of a Helmholtz resonator with constriction area acx and with a constriction length and pre-constriction volume that are determined by the place of articulation. On the other hand, f1 can also be used to specify a tongue dorsum (e.g., velar) constriction (acd), in which case acx directly reflects the value of f1. [The parameter f1 also plays an important role in the synthesis of nasals. This role and other mapping relations that are operative when the velopharyngeal port is open are not discussed here.]

1.3. Mapping relations for liquids

In the current version of HLsyn, the method for synthesizing the transfer function associated with liquids (and glides) is only an approximation. When produced with a glottal source (e.g., for liquids following a voiced stop), these sonorants are synthesized simply by specifying the time course of the formants, using default formant bandwidths. This method neglects two factors: (1) that certain formant bandwidths may be significantly widened due to increased acoustic losses in the vocal tract, and (2) that a vocal tract constriction can affect the glottal source, resulting in decreases in amplitude and increases in OQ and TL.

Because the production of lateral and retroflex consonants can result in a relatively constricted airway, turbulence can be generated at the constriction when the glottis is open and airflow is sufficiently high. The relationship between the distinctive pattern of formants for these consonants (for males: 3502700 for /l/) and constriction size (acl) is modeled as a Helmholtz resonator. When acl is sufficiently small and its value is also the smallest current oral constriction (acx), the noise source is shaped by A3F, reflecting the fact that the natural frequency of the cavity in front of the constriction is always the third formant.

2. /s/-STOP SYNTHESIS

In the following two sections, the parameter settings needed to synthesize all /s/-stop-liquid, voiceless stop-liquid and voiced stop-liquid clusters before the three vowels /i, a, ε/ are discussed. First, we examine the oral and glottal constriction and formant parameter settings for the stop and preceding (optional) fricative.

2.1. Oral constriction parameter settings

Oral Constriction (ac )
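
The Helmholtz approximation for f1c described in Section 1.2 can be illustrated with the standard resonator formula f = (c / 2π) · sqrt(A / (V · l)), where A is the constriction area, l the constriction length, and V the volume behind the constriction. The sketch below is a hedged illustration rather than the HLsyn mapping itself: the function names are hypothetical, the speed of sound is an assumed value, and the per-place lengths and volumes are placeholder numbers that do not come from the paper.

```python
import math

SPEED_OF_SOUND_CM_S = 35400.0  # assumed value for warm, moist air

# Placeholder geometry per place of articulation (NOT taken from the paper):
# (constriction length in cm, volume behind the constriction in cm^3).
PLACE_GEOMETRY = {
    "labial": (1.0, 50.0),
    "alveolar": (1.0, 40.0),
}


def helmholtz_frequency(area_cm2, length_cm, volume_cm3):
    """Lowest resonance (Hz) of an idealized rigid-wall Helmholtz resonator."""
    if area_cm2 <= 0.0:
        # Complete closure: the idealized resonance falls to zero.
        return 0.0
    ratio = area_cm2 / (length_cm * volume_cm3)
    return SPEED_OF_SOUND_CM_S / (2.0 * math.pi) * math.sqrt(ratio)


def f1c_estimate(acx_mm2, place):
    """Estimate f1c from the constriction area acx (mm^2) and a place label."""
    length_cm, volume_cm3 = PLACE_GEOMETRY[place]
    # Convert mm^2 to cm^2 so that all quantities share the same units.
    return helmholtz_frequency(acx_mm2 / 100.0, length_cm, volume_cm3)


if __name__ == "__main__":
    for place in ("labial", "alveolar"):
        for acx in (5.0, 10.0, 20.0):
            print(place, acx, round(f1c_estimate(acx, place), 1))
```

With any fixed geometry the estimate falls as the constriction narrows, which is the qualitative behavior expected of f1 during a labial or alveolar closing gesture.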

Related resources

Development of rules for controlling the HLsyn speech synthesizer

In this paper we describe the development of rules to drive a quasi-articulatory speech synthesizer, HLsyn. HLsyn has 13 parameters, which are mapped to the parameters of a formant synthesizer. Its small number of parameters combined with the computational simplicity of a formant synthesizer make it a good basis for a text-to-speech system. An overview of the rule-driven system, called VHLsyn, ...


Speech Synthesis for Urdu Vowels Using Hlsyn

This paper tries to give the brief overview of different kinds of speech synthesis systems (formant, concatenative and articulatory). General steps, which are involved in the synthesis, are discussed. Moreover, the Klatt synthesizer is also discussed in some detail. This paper also includes the synthesis of Urdu oral vowels (, , æ, , i, e, , o, , , ) using High Level synthesizer (HLSyn)....


PROCSY: A hybrid approach to high-quality formant synthesis using HLSyn

PROCSY is a hybrid method of automatically producing natural-sounding formant-based synthetic speech from an existing speech signal by using copy-synthesis and estimated articulatory trajectories as input to the HLsyn synthesizer. The purpose is to allow controlled manipulation of selected acoustic parameters. Parameters for HLsyn are derived from prosodically parsed and labelled speech files in t...


Gestural Overlap of Stop-Consonant Sequences

This study used an analysis-by-synthesis approach to discover possible principles governing the coordination of oral and laryngeal articulators in the production of English stop-consonant sequences. Recorded utterances containing stop-consonant sequences were analyzed acoustically, with focus on formant movements, closure durations, release bursts, and spectrum shape at low frequencies. The res...


Gestural overlap and C-center in selected French consonant clusters

Inter-consonantal cohesion in French word-initial CC clusters is investigated in light of recent proposals of gestural coordination. Specifically, the timing of lip and tongue movements of C1/l/ and C1/n/ productions, with C1 being one of the consonants /p, f, k/, of two speakers were studied using electromagnetic articulography (EMA). In French, C/l/ clusters occur frequently in word-initial p...




Publication date: 1996